Finding Mutated Subnetworks Associated with Survival in Cancer
Next-generation sequencing technologies allow the measurement of somatic
mutations in a large number of patients from the same cancer type. One of the
main goals in analyzing these mutations is the identification of mutations
associated with clinical parameters, such as survival time. This goal is hindered by the genetic heterogeneity of mutations in cancer, which arises because genes and mutations act in the context of pathways. To identify mutations
associated with survival time it is therefore crucial to study mutations in the
context of interaction networks.
In this work we study the problem of identifying subnetworks of a large
gene-gene interaction network that have mutations associated with survival. We
formally define the associated computational problem by using a score for
subnetworks based on the test statistic of the log-rank test, a widely used
statistical test for comparing the survival of two populations. We show that
the computational problem is NP-hard and we propose a novel algorithm, called
Network of Mutations Associated with Survival (NoMAS), to solve it. NoMAS is
based on the color-coding technique, which has previously been used in other
applications to find the highest scoring subnetwork with high probability when
the subnetwork score is additive. In our case the score is not additive;
nonetheless, we prove that under a reasonable model for mutations in cancer
NoMAS does identify the optimal solution with high probability. We test NoMAS
on simulated and cancer data, comparing it to approaches based on single gene
tests and to various greedy approaches. We show that our method does indeed
find the optimal solution and performs better than the other approaches.
Moreover, on two cancer datasets our method identifies subnetworks significantly associated with survival even when no individual gene shows significant association with survival in isolation.
Comment: This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings.
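At the heart of the formulation is the subnetwork score: the normalized log-rank statistic comparing the survival of patients with and without a mutation in the candidate subnetwork. The following minimal sketch (Python, with illustrative variable names; it is not the NoMAS implementation) shows how that statistic can be computed for a given binary split of the patients.

```python
import numpy as np

def logrank_statistic(time, event, in_group):
    """Normalized log-rank statistic comparing survival of two groups.

    time     : follow-up times
    event    : 1 if the event (death) was observed, 0 if censored
    in_group : 1 if the patient has a mutation in the candidate subnetwork
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    in_group = np.asarray(in_group, dtype=int)

    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):       # distinct event times
        at_risk = time >= t                     # patients still under observation
        n = at_risk.sum()                       # total at risk
        n1 = (at_risk & (in_group == 1)).sum()  # at risk in the mutated group
        d = ((time == t) & (event == 1)).sum()  # events at time t
        d1 = ((time == t) & (event == 1) & (in_group == 1)).sum()
        o_minus_e += d1 - d * n1 / n            # observed minus expected in group 1
        if n > 1:                               # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / np.sqrt(var)             # approx. N(0,1) under the null

# Example: patients mutated in the subnetwork (in_group=1) die earlier.
print(logrank_statistic(time=[2, 3, 5, 8, 9, 12],
                        event=[1, 1, 1, 0, 1, 1],
                        in_group=[1, 1, 1, 0, 0, 0]))
```

Because this score is a ratio involving all selected patients, it is not a sum of per-gene contributions, which is exactly why the standard additive color-coding analysis does not apply directly.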
Finding the True Frequent Itemsets
Frequent Itemsets (FIs) mining is a fundamental primitive in data mining. It requires identifying all itemsets appearing in at least a fraction $\theta$ of a transactional dataset $\mathcal{D}$. Often though, the ultimate goal of mining $\mathcal{D}$ is not an analysis of the dataset \emph{per se}, but the understanding of the underlying process that generated it. Specifically, in many applications $\mathcal{D}$ is a collection of samples obtained from an unknown probability distribution $\pi$ on transactions, and by extracting the FIs in $\mathcal{D}$ one attempts to infer itemsets that are frequently (i.e., with probability at least $\theta$) generated by $\pi$, which we call the True Frequent Itemsets (TFIs). Due to the inherently stochastic nature of the generative process, the set of FIs is only a rough approximation of the set of TFIs, as it often contains a huge number of \emph{false positives}, i.e., spurious itemsets that are not among the TFIs. In this work we design and analyze an algorithm to identify a threshold $\hat{\theta}$ such that the collection of itemsets with frequency at least $\hat{\theta}$ in $\mathcal{D}$ contains only TFIs with probability at least $1-\delta$, for some user-specified $\delta$. Our method uses results from statistical learning theory involving the (empirical) VC-dimension of the problem at hand. This allows us to identify almost all the TFIs without including any false positive. We also experimentally compare our method with the direct mining of $\mathcal{D}$ at frequency $\theta$ and with techniques based on widely used standard bounds (i.e., the Chernoff bounds) on the binomial distribution, and show that our algorithm outperforms these methods and achieves even better results than what is guaranteed by the theoretical analysis.
Comment: 13 pages. Extended version of work that appeared in the SIAM International Conference on Data Mining, 2014.
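To make the thresholding idea concrete, here is a sketch under a generic uniform-deviation bound: if every itemset's empirical frequency is within $\epsilon$ of its true frequency, then any itemset mined at $\hat{\theta} = \theta + \epsilon$ must have true frequency at least $\theta$. The constant form of the bound below is a textbook VC-style illustration, not the paper's (empirical) VC-dimension machinery, and the VC-dimension value passed in is an assumed input.

```python
import math

def deviation_bound(d, m, delta):
    """One textbook uniform-deviation bound for a range space of
    VC-dimension at most d, over m samples, at confidence 1 - delta.
    Exact constants vary across statements; this form is illustrative.
    """
    return math.sqrt((d * math.log(2 * math.e * m / d) + math.log(4 / delta)) / (2 * m))

def raised_threshold(theta, d, m, delta):
    """Mining at theta + eps returns only itemsets whose true frequency
    is at least theta, with probability >= 1 - delta: if every empirical
    frequency is within eps of its true value, an itemset with empirical
    frequency >= theta + eps has true frequency >= theta.
    """
    return theta + deviation_bound(d, m, delta)

# Example: 10^6 transactions, an (assumed) VC-dimension bound of 20, delta = 0.05.
print(raised_threshold(theta=0.01, d=20, m=10**6, delta=0.05))
```

The trade-off is visible in the formula: a tighter VC-dimension bound or a larger dataset shrinks $\epsilon$, so fewer TFIs are missed by the raised threshold.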
Counterpart semantics for a second-order mu-calculus
We propose a novel approach to the semantics of quantified μ-calculi, considering models where states are algebras; the evolution relation is given by a counterpart relation (a family of partial homomorphisms), allowing for the creation, deletion, and merging of components; and formulas are interpreted over sets of state assignments (families of substitutions, associating formula variables to state components). Our proposal avoids the limitations of existing approaches, which usually enforce restrictions on the evolution relation: the resulting semantics is streamlined and intuitively appealing, yet general enough to cover most of the alternative proposals we are aware of.
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation
Sensor-based human activity recognition (HAR) requires predicting the actions
of a person based on sensor-generated time series data. HAR has attracted major
interest in the past few years, thanks to the large number of applications
enabled by modern ubiquitous computing devices. While several techniques based
on hand-crafted feature engineering have been proposed, the current
state-of-the-art is represented by deep learning architectures that
automatically obtain high level representations and that use recurrent neural
networks (RNNs) to extract temporal dependencies in the input. RNNs have
several limitations, in particular in dealing with long-term dependencies. We
propose a novel deep learning framework based on a purely attention-based mechanism that overcomes the limitations of the state-of-the-art. We show that our proposed attention-based architecture is considerably more powerful than previous approaches, with an average increment on the F1 score over the previous best performing model.
Furthermore, we consider the problem of personalizing HAR deep learning models,
which is of great importance in several applications. We propose a simple and
effective transfer-learning based strategy to adapt a model to a specific user,
providing an average increment on the F1 score on the predictions for
that user. Our extensive experimental evaluation proves the significantly
superior capabilities of our proposed framework over the current
state-of-the-art and the effectiveness of our user adaptation technique.
Comment: Accepted for publication in the IEEE Sensors Journal.
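For intuition on why a purely attention-based architecture sidesteps the long-term-dependency problem of RNNs, here is a minimal single-head scaled dot-product self-attention over a sensor window, in plain NumPy. The dimensions and projection matrices are illustrative assumptions; this is not the paper's architecture.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sensor window.

    X          : (T, F) window of T time steps with F sensor channels
    Wq, Wk, Wv : (F, D) projection matrices (learned in a real model)
    Returns a (T, D) representation where each step attends to all steps,
    so long-range dependencies need not survive a step-by-step recurrence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # (T, T) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
T, F, D = 128, 9, 16          # e.g., 128 steps of a 9-channel IMU signal
X = rng.standard_normal((T, F))
Wq, Wk, Wv = (rng.standard_normal((F, D)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (128, 16)
```

In the same spirit, the user-adaptation strategy described above amounts to transfer learning: keep the pretrained representation and fine-tune on the target user's data.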
PRESTO: Simple and Scalable Sampling Techniques for the Rigorous Approximation of Temporal Motif Counts
The identification and counting of small graph patterns, called network
motifs, is a fundamental primitive in the analysis of networks, with
application in various domains, from social networks to neuroscience. Several
techniques have been designed to count the occurrences of motifs in static
networks, with recent work focusing on the computational challenges provided by
large networks. Modern networked datasets contain rich information, such as the time at which the events modeled by the network's edges happened, which can provide useful insights into the process modeled by the network. The analysis
of motifs in temporal networks, called temporal motifs, is becoming an
important component in the analysis of modern networked datasets. Several
methods have been recently designed to count the number of instances of
temporal motifs in temporal networks, which is even more challenging than its
counterpart for static networks. Such methods are either exact, and not
applicable to large networks, or approximate, but provide only weak guarantees
on the estimates they produce and do not scale to very large networks. In this
work we present an efficient and scalable algorithm to obtain rigorous
approximations of the count of temporal motifs. Our algorithm is based on a
simple but effective sampling approach, which renders our algorithm practical
for very large datasets. Our extensive experimental evaluation shows that our
algorithm provides estimates of temporal motif counts which are more accurate
than the state-of-the-art sampling algorithms, with significantly lower running
time than exact approaches, enabling the study of temporal motifs of larger size than those considered in previous works on networks with billions of edges.
Comment: 19 pages, 5 figures. To appear in SDM 2021.
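The flavor of the sampling approach can be sketched as follows: draw windows of length c times the motif duration delta uniformly at random, count motif instances exactly inside each window, and reweight each instance by the inverse of its inclusion probability. The Horvitz-Thompson-style estimator below and the toy helper two_path_instances are illustrative assumptions, not PRESTO's exact scheme.

```python
import random

def sampled_motif_estimate(edges, find_instances, delta, c=5, samples=100, seed=0):
    """Estimate a temporal-motif count by sampling windows of length c*delta.

    edges          : list of (u, v, t) triples sorted by timestamp t
    find_instances : exact counter returning the (t_min, t_max) time span of
                     every motif instance inside a list of edges
    An instance spanning [a, b] (with b - a <= delta) is fully contained in a
    random window [s, s + c*delta) with probability (c*delta - (b - a)) / L,
    where L is the length of the interval the start s is drawn from; weighting
    each found instance by the inverse of that probability gives an unbiased
    estimate of the total count.
    """
    rng = random.Random(seed)
    W = c * delta
    t_lo, t_hi = edges[0][2], edges[-1][2]
    lo, hi = t_lo - W, t_hi               # start range covering every instance
    L = hi - lo
    total = 0.0
    for _ in range(samples):
        s = rng.uniform(lo, hi)
        window = [e for e in edges if s <= e[2] < s + W]
        for a, b in find_instances(window):
            total += L / (W - (b - a))    # inverse-probability weight
    return total / samples

def two_path_instances(window, delta=10):
    """Toy exact counter: spans of u->v, v->w edge pairs within delta time."""
    return [(t1, t2)
            for u, v, t1 in window
            for v2, w, t2 in window
            if v == v2 and 0 < t2 - t1 <= delta]

edges = sorted([(1, 2, 3), (2, 3, 8), (2, 4, 30), (1, 2, 50), (2, 5, 55)],
               key=lambda e: e[2])
print(sampled_motif_estimate(edges, two_path_instances, delta=10))
```

The appeal of window sampling is that the expensive exact counter only ever runs on small slices of the data, which is what makes the approach practical on very large temporal networks.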
Efficient algorithms to discover alterations with complementary functional association in cancer
Recent large cancer studies have measured somatic alterations in an
unprecedented number of tumours. These large datasets allow the identification
of cancer-related sets of genetic alterations by identifying relevant
combinatorial patterns. Among such patterns, mutual exclusivity has been
employed by several recent methods that have shown its effectiveness in characterizing gene sets associated with cancer. Mutual exclusivity arises
because of the complementarity, at the functional level, of alterations in
genes which are part of a group (e.g., a pathway) performing a given function.
The availability of quantitative target profiles, from genetic perturbations or
from clinical phenotypes, provides additional information that can be leveraged
to improve the identification of cancer related gene sets by discovering groups
with complementary functional associations with such targets.
In this work we study the problem of finding groups of mutually exclusive
alterations associated with a quantitative (functional) target. We propose a
combinatorial formulation for the problem, and prove that the associated computational problem is NP-hard. We design two algorithms to solve
the problem and implement them in our tool UNCOVER. We provide analytic
evidence of the effectiveness of UNCOVER in finding high-quality solutions and
show experimentally that UNCOVER finds sets of alterations significantly
associated with functional targets in a variety of scenarios. In addition, our
algorithms are much faster than the state-of-the-art, allowing the analysis of
large datasets of thousands of target profiles from cancer cell lines. We show
that on one such dataset from project Achilles our methods identify several
significant gene sets with complementary functional associations with targets.
Comment: Accepted at RECOMB 2018.
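As a concrete (assumed) illustration of the combinatorial flavor of the problem, the sketch below greedily selects genes under an objective that rewards target-weighted coverage and penalizes overlapping alterations; the exact UNCOVER score and algorithms differ, so treat this as a toy model of mutual exclusivity with a quantitative target.

```python
import numpy as np

def greedy_exclusive_set(A, w, k):
    """Greedily pick k genes (rows of A) whose alterations are approximately
    mutually exclusive and associated with a quantitative target w.

    A : (genes, samples) binary alteration matrix
    w : (samples,) quantitative target profile
    Objective (an illustrative choice, not UNCOVER's exact score): sum of w
    over covered samples, minus w-weighted excess coverage of samples hit by
    more than one chosen gene, so overlaps are penalized.
    """
    def score(rows):
        cov = A[rows].sum(axis=0)     # how many chosen genes alter each sample
        return (w * (cov >= 1)).sum() - (w * np.maximum(cov - 1, 0)).sum()

    chosen = []
    for _ in range(k):
        gains = [(score(chosen + [g]), g)
                 for g in range(A.shape[0]) if g not in chosen]
        _, best_gene = max(gains)
        chosen.append(best_gene)
    return chosen

A = np.array([[1, 1, 0, 0, 0],    # gene 0 alters samples 0,1
              [0, 0, 1, 1, 0],    # gene 1 alters samples 2,3 (complementary)
              [1, 0, 1, 0, 0]])   # gene 2 overlaps both
w = np.array([2.0, 1.5, 1.0, 2.5, 0.1])
print(greedy_exclusive_set(A, w, k=2))   # the two complementary genes, 0 and 1
```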
Are Graph Convolutional Networks Fully Exploiting Graph Structure?
Graph Convolutional Networks (GCNs) generalize the idea of deep convolutional
networks to graphs, and achieve state-of-the-art results on many graph related
tasks. GCNs rely on the graph structure to define an aggregation strategy where
each node updates its representation by combining information from its
neighbours. In this paper we formalize four levels of structural information
injection, and use them to show that GCNs ignore important long-range
dependencies embedded in the overall topology of a graph. Our proposal includes
a novel regularization technique based on random walks with restart, called
RWRReg, which encourages the network to encode long-range information into the
node embeddings. RWRReg is further supported by our theoretical analysis, which
demonstrates that random walks with restart empower aggregation-based
strategies (i.e., the Weisfeiler-Leman algorithm) with long-range information.
We conduct an extensive experimental analysis studying the change in
performance of several state-of-the-art models given by the four levels of
structural information injection, on both transductive and inductive tasks. The
results show that the lack of long-range structural information greatly affects
performance on all considered models, and that the information extracted by
random walks with restart, and exploited by RWRReg, gives an average accuracy improvement on all considered tasks.
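One plausible instantiation of the idea (the exact form of RWRReg is not spelled out above, so the regularizer below is an assumption): compute random-walk-with-restart proximities in closed form, then penalize node embeddings whose pairwise similarities disagree with those long-range proximities.

```python
import numpy as np

def rwr_matrix(adj, restart=0.15):
    """Random-walk-with-restart proximity: R[i, j] is the stationary
    probability that a walk restarting at node i with probability `restart`
    is found at node j; closed form R = c (I - (1 - c) P)^{-1}, with P the
    row-normalized adjacency matrix.
    """
    P = adj / adj.sum(axis=1, keepdims=True)
    n = adj.shape[0]
    return restart * np.linalg.inv(np.eye(n) - (1 - restart) * P)

def rwr_regularizer(embeddings, R):
    """Illustrative long-range penalty: mean squared disagreement between
    pairwise embedding similarity and RWR proximity, to be added to the
    task loss during training.
    """
    sim = embeddings @ embeddings.T
    return np.mean((sim - R) ** 2)

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
R = rwr_matrix(adj)
emb = np.random.default_rng(0).standard_normal((4, 8)) * 0.1
print(rwr_regularizer(emb, R))
```

Because R encodes multi-hop proximities, gradients from this term push information about distant nodes into the embeddings, which plain neighbour aggregation cannot do in few layers.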
Modelling and analyzing adaptive self-assembling strategies with Maude
Building adaptive systems with predictable emergent behavior is a challenging task, and it is becoming a critical need. The research community has accepted the challenge by introducing approaches of various kinds: from software architectures, to programming paradigms, to analysis techniques. We recently proposed a conceptual framework for adaptation centered around the role of control data. In this paper we show that it can be naturally realized in a reflective logical language like Maude by using the Reflective Russian Dolls model. Moreover, we exploit this model to specify, validate, and analyse a prominent example of an adaptive system: robot swarms equipped with self-assembly strategies. The analysis exploits the statistical model checker PVeStA.
Adaptation is a Game
Control data variants of game models such as Interface Automata are suitable for the design and analysis of self-adaptive systems.